{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 基本方法\n",
    "\n",
    "设输入空间$\\mathcal{X} \\subseteq \\mathbb{R}^n$  为 n 维向量的集合，输出空间为类标记集合$\\mathcal{Y} = \\{c_1,c_2,...,c_K\\}$ .输入为特征向量 $x\\in\\mathcal{X}$ ,，输出为类标记$y\\in\\mathcal{Y}$ 。$X$ 是定义在输入空间$\\mathcal{X}$ 上的随机向量，$Y$是定义在输出空间$\\mathcal{Y}$上的随机变量 。$P(X,Y)$ 是 $X$ 和 $Y$ 的联合概率分布。训练数据集\n",
    "\n",
    "$$\n",
    "T = \\{(x_1,y_1),(x_2,y_2),...,(x_N,y_N)\\}\n",
    "$$\n",
    "\n",
    "由$P(X,Y)$ 独立同分布产生。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**推导：**\n",
    "\n",
    "朴素贝叶斯法通过训练数据集学习联合概率分布 $P(X,Y)$\n",
    "先学习先验概率分布和条件概率分布\n",
    "\n",
    "- 先验概率分布         $P(Y=c_k)，k=1,2,...,K$\n",
    "    \n",
    "  条件概率分布         $P(X=x|Y=c_k)=P(X^{(1)}=x^{(1)},...,X^{(n)}=x^{(n)}|Y=c_k),k=1,2,...K$\n",
    "  \n",
    "  朴素贝叶斯法对条件概率分布作了条件独立性的假设，也就是说各个输入之间相互独立，由$P(AB)=P(A)P(B)$，可得\n",
    "  \n",
    "  $$P(X=x|Y=c_k) = \\prod^n_{j=1} P(X^{(j)}=x^{(j)}|Y=c_k)  $$\n",
    "    \n",
    "因此，$$ P(X,Y) = P(Y=c_k)\\cdot P(X=x|Y=c_k)  $$                           \n",
    "\n",
    "\n",
    "   \n",
    "           "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在进行朴素贝叶斯法分类时，对给定的输入x，通过学习到的模型计算后验概率分布$P(Y=c_k|X=x)$ 将后验概率最大的类作为x的类输出。后验概率计算根据贝叶斯定理进行:\n",
    "\n",
    "$$ P(Y=c_k|X=x) = \\frac{P(X,Y)}{P(X)}$$\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$=\\frac{P(X=x|Y=c_k) P(Y=c_k)}{\\sum\\limits_k P(X=x|Y=c_k) P(Y=c_k) } $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "根据上面独立公式，可得朴素贝叶斯法分类的基本公式\n",
    "\n",
    "$$P(Y=c_k|X=x) = \\frac{P(Y=c_k)\\prod\\limits_{j=1}^n P(X^{(j)}=x^{(j)}|Y=c_k)}{\\sum\\limits_k  P(Y=c_k)\\prod\\limits^n_{j=1} P(X^{(j)}=x^{(j)}|Y=c_k)},k=1,2,...K$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上面提到，将后验概率最大的类作为x的输出，则朴素贝叶斯分类器可表示为\n",
    "\n",
    "\n",
    "$$y = f(x) = arg\\max_{c_{k}}\\frac{P(Y=c_k)\\prod\\limits^n_{j=1} P(X^{(j)}=x^{(j)}|Y=c_k)}{\\sum\\limits_k  P(Y=c_k)\\prod\\limits^n_{j=1} P(X^{(j)}=x^{(j)}|Y=c_k)}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "$$y=\\arg \\max _{c_{k}} P\\left(Y=c_{k}\\right) \\prod_{j=1}^{n} P\\left(X_{j}=x^{(j)} | Y=c_{k}\\right)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  贝叶斯分类的实例\n",
    "\n",
    "<img src=\"https://raw.githubusercontent.com/liuzhaoo/markdown_pics/master/img/bayesti.png\" style=\"zoom:60%;\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"https://raw.githubusercontent.com/liuzhaoo/markdown_pics/master/img/by2.png\" style=\"zoom:77%;\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 朴素贝叶斯法的参数估计\n",
    "\n",
    "#### 极大似然估计\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "先验概率$P(Y=c_k)$ 的极大似然估计是\n",
    "$$P(Y=c_k) = \\frac{\\sum\\limits_{i=1}^N I(y_i=c_k)}{N}, k = 1,2...K$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "条件概率$P(X^{(j)} = a_{jl}|Y=c_k)$ 的极大似然估计值为\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$P(X^{(j)} = a_{jl}|Y=c_k) = \\frac{\\sum\\limits_{i=1}^N I(x_i^{(j)} = a_{jl},y_i=c_k)}{\\sum\\limits_{i=1}^N I(y_i = c_k)}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 贝叶斯估计\n",
    "先验概率的贝叶斯估计是\n",
    "$$P_\\lambda(Y=c_k) = \\frac{\\sum\\limits_{i=1}^N I(y_i=c_k) + \\lambda}{N+K\\lambda}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "条件概率的贝叶斯估计是\n",
    "$$P_\\lambda(X^{(j)} = a_{jl}|Y=c_k) = \\frac{\\sum\\limits_{i=1}^N I(x_i^{(j)} = a_{jl},y_i=c_k) +\\lambda}{\\sum\\limits_{i=1}^N I(y_i = c_k)+S_j\\lambda}$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## 代码实现\n",
    "\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
